BagBoosting for tumor classification with gene expression data
Author
Abstract
MOTIVATION: Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection, and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes, bagging and boosting, into a novel algorithm called BagBoosting.
RESULTS: When bagging is used as a module in boosting, the resulting classifier consistently improves on the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained simply by investing more computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data.
AVAILABILITY: Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under the GNU public license at http://stat.ethz.ch/~dettling/bagboost.html.
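The idea of using bagging as a module inside boosting can be illustrated with a small sketch. This is not the authors' R package or their exact algorithm: the abstract does not state the loss function or base learner, so the sketch assumes squared-error boosting with labels coded -1/+1 and regression stumps as the bagged base learner. All function names (`fit_stump`, `fit_bagged_stumps`, `bagboost_fit`, `bagboost_predict`) and parameters (`n_rounds`, `n_bags`, `step`) are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, r):
    """Least-squares regression stump: best single-feature threshold split for targets r."""
    n, p = len(X), len(X[0])
    best = None
    for j in range(p):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # Every feature is constant in this sample: fall back to predicting the mean.
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def fit_bagged_stumps(X, r, n_bags, rng):
    """Bagging step: fit one stump per bootstrap resample of (X, r)."""
    n = len(X)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        stumps.append(fit_stump([X[i] for i in idx], [r[i] for i in idx]))
    return stumps

def predict_bag(stumps, row):
    """Bagged prediction: average over the bootstrap stumps."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)

def bagboost_fit(X, y, n_rounds=20, n_bags=10, step=0.5, seed=0):
    """Boosting loop (assumed squared-error loss) whose base learner is a bagged stump ensemble."""
    rng = random.Random(seed)
    F = [0.0] * len(X)  # current additive score for each training row
    model = []
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        bag = fit_bagged_stumps(X, residuals, n_bags, rng)
        model.append(bag)
        F = [fi + step * predict_bag(bag, row) for fi, row in zip(F, X)]
    return model, step

def bagboost_predict(model, step, row):
    """Classify by the sign of the accumulated score."""
    score = sum(step * predict_bag(bag, row) for bag in model)
    return 1 if score >= 0 else -1

# Toy data (made up for illustration): the first feature separates the two classes.
X = [[0.1, 5.0], [0.3, 1.0], [0.2, 3.0], [0.9, 2.0], [0.8, 4.0], [0.7, 0.5]]
y = [-1, -1, -1, 1, 1, 1]
model, step = bagboost_fit(X, y)
print([bagboost_predict(model, step, row) for row in X])
```

In this sketch the extra computing effort mentioned in the abstract shows up as the inner loop over `n_bags` bootstrap fits per boosting round; setting `n_bags=1` reduces it to boosting with a single stump fitted on one bootstrap resample per round.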
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data provide a wealth of information across thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with statistical methods to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Evaluation of PRR11 gene expression changes and its relationship with tumor size in patients with gastric adenocarcinoma
Introduction: Gastric cancer is one of the most common gastrointestinal tract neoplasms. Because of its invasiveness and its nonspecific symptoms and signs, the disease is often diagnosed at an advanced stage with short survival. PRR11 participates in the initiation and progression of lung cancer and breast cancer by regulating important genes involved in the cell cycle and tumorigenesis. In this researc...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also makes it more challenging to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
Study of HMGA2 Gene Inhibition with Specific shRNA and siRNA and Investigation of Corresponding Effects on Downstream Gene Expression in MDA-MB-231 Cancer Cells: A Bioinformatic and Experimental Study
Background & Aims: The use of siRNA to silence gene expression is expanding rapidly. The aim of this study is to investigate, bioinformatically and experimentally, the inhibition of the HMGA2 gene and its corresponding effects on the expression of downstream genes in MDA-MB-231 cancer cells treated with shRNA and siRNA specific to HMGA2. Materials & Methods: To perform this bioinformatic a...
Journal title:
- Bioinformatics
Volume 20, Issue 18
Pages: -
Publication date: 2004